A Dual-Perspective Approach to Evaluating Feature Attribution Methods
Feature attribution methods attempt to explain neural network predictions by
identifying relevant features. However, establishing a cohesive framework for
assessing feature attribution remains a challenge. There are several views
through which we can evaluate attributions. One principal lens is to observe
the effect of perturbing attributed features on the model's behavior (i.e.,
faithfulness). While providing useful insights, existing faithfulness
evaluations suffer from shortcomings that we reveal in this paper. In this
work, we propose two new perspectives within the faithfulness paradigm that
reveal intuitive properties: soundness and completeness. Soundness assesses the
degree to which attributed features are truly predictive features, while
completeness examines how well the resulting attribution reveals all the
predictive features. The two perspectives are based on a firm mathematical
foundation and provide quantitative metrics that are computable through
efficient algorithms. We apply these metrics to mainstream attribution methods,
offering a novel lens through which to analyze and compare feature attribution
methods.
Comment: 16 pages, 14 figures
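A minimal sketch of the perturbation test that faithfulness evaluations of this kind build on (the paper's soundness and completeness metrics are not reproduced here); the model, the attribution map, the perturbed fraction, and the baseline value are illustrative assumptions:

import torch

def perturbation_faithfulness(model, x, attribution, frac=0.1, baseline=0.0):
    # Remove the most-attributed input features and measure how much the
    # model's top output score drops; a larger drop suggests the attribution
    # points at features the model actually relies on.
    flat = attribution.flatten()
    k = max(1, int(frac * flat.numel()))
    top_idx = flat.abs().topk(k).indices
    x_pert = x.detach().clone().flatten()
    x_pert[top_idx] = baseline          # replace top features with a baseline value
    x_pert = x_pert.view_as(x)
    with torch.no_grad():
        score_orig = model(x.unsqueeze(0)).max()
        score_pert = model(x_pert.unsqueeze(0)).max()
    return (score_orig - score_pert).item()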
Neural Response Interpretation through the Lens of Critical Pathways
Is critical input information encoded in specific sparse pathways within the
neural network? In this work, we discuss the problem of identifying these
critical pathways and subsequently leverage them for interpreting the network's
response to an input. The pruning objective -- selecting the smallest group of
neurons for which the response remains equivalent to the original network --
has been previously proposed for identifying critical pathways. We demonstrate
that sparse pathways derived from pruning do not necessarily encode critical
input information. To ensure sparse pathways include critical fragments of the
encoded input information, we propose pathway selection via neurons'
contribution to the response. We proceed to explain how critical pathways can
reveal critical input features. We prove that pathways selected via neuron
contribution are locally linear (in an L2-ball), a property that we use for
proposing a feature attribution method: "pathway gradient". We validate our
interpretation method using mainstream evaluation experiments. The validation
of the pathway gradient interpretation method further confirms that the pathways
selected via neuron contribution correspond to critical input features. The
code is publicly available.
Comment: Accepted at CVPR 2021 (IEEE/CVF Conference on Computer Vision and Pattern Recognition)
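An illustrative sketch of the two-step idea under assumed details, not the authors' exact algorithm: hidden neurons are scored by a simple contribution measure (activation times the gradient of the response), low-contribution neurons are masked to form a sparse pathway, and the input gradient through that masked network serves as the attribution. The toy network, the contribution score, and the number of kept neurons are assumptions:

import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
x = torch.randn(8, requires_grad=True)

# Contribution of each hidden neuron: activation times d(response)/d(activation).
h = net[1](net[0](x))
h.retain_grad()
response = net[2](h)
response.backward()
contribution = (h * h.grad).detach()

# Keep only the top-contributing neurons; these form the sparse pathway.
mask = (contribution.abs() >= contribution.abs().topk(4).values.min()).float()

# "Pathway gradient": the input gradient computed through the masked network.
x_path = x.detach().clone().requires_grad_(True)
net[2](net[1](net[0](x_path)) * mask).backward()
print(x_path.grad)   # attribution over the 8 input features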
AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments
Feature attribution explains neural network outputs by identifying relevant
input features. How do we know if the identified features are indeed relevant
to the network? This notion is referred to as faithfulness, an essential
property that reflects the alignment between the identified (attributed)
features and the features used by the model. One recent trend to test
faithfulness is to design the data such that we know which input features are
relevant to the label and then train a model on the designed data.
Subsequently, the identified features are evaluated by comparing them with
these designed ground truth features. However, this idea has the underlying
assumption that the neural network learns to use all and only these designed
features, while there is no guarantee that the learning process trains the
network in this way. In this paper, we close this gap by explicitly
designing both the data and the neural network, manually setting its weights,
so that we know precisely which input features in the dataset are
relevant to the designed network. Thus, we can test faithfulness in
AttributionLab, our designed synthetic environment, which serves as a sanity
check and is effective at filtering out unfaithful attribution methods. If an attribution
method is not faithful in a simple controlled environment, it can be unreliable
in more complex scenarios. Furthermore, the AttributionLab environment serves
as a laboratory for controlled experiments through which we can study feature
attribution methods, identify issues, and suggest potential improvements.
Comment: 32 pages including Appendix
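A toy illustration of the underlying idea, not the actual AttributionLab environment: the network's weights are set by hand so that only a designed subset of inputs can influence the output, and any attribution map can then be scored against that known ground truth. The sizes, the designed feature mask, and the agreement score are assumptions:

import torch
import torch.nn as nn

# Designed ground truth: only the first 3 of 10 input features are relevant.
relevant = torch.zeros(10)
relevant[:3] = 1.0

# Manually designed network: weights are zero everywhere except on the
# designed features, so we know exactly which inputs the model can use.
net = nn.Linear(10, 1, bias=False)
with torch.no_grad():
    net.weight.zero_()
    net.weight[0, :3] = 1.0

# Score an attribution map (here simply the input gradient) against the design.
x = torch.randn(10, requires_grad=True)
net(x).backward()
attribution = x.grad.abs()
agreement = (attribution * relevant).sum() / attribution.sum()
print(f"attribution mass on designed features: {agreement.item():.2f}")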
CheXplaining in Style: Counterfactual Explanations for Chest X-rays using StyleGAN
Deep learning models used in medical image analysis are prone to raising
reliability concerns due to their black-box nature. To shed light on these
black-box models, previous works predominantly focus on identifying the
contribution of input features to the diagnosis, i.e., feature attribution. In
this work, we explore counterfactual explanations to identify what patterns the
models rely on for diagnosis. Specifically, we investigate the effect of
changing features within chest X-rays on the classifier's output to understand
its decision mechanism. We leverage a StyleGAN-based approach (StyleEx) to
create counterfactual explanations for chest X-rays by manipulating specific
latent directions in their latent space. In addition, we propose EigenFind to
significantly reduce the computation time of generated explanations. We
clinically evaluate the relevancy of our counterfactual explanations with the
help of radiologists. Our code is publicly available.
Comment: Accepted to the ICML 2022 Interpretable Machine Learning in
Healthcare (IMLH) Workshop. Project website:
http://github.com/CAMP-eXplain-AI/Style-CheXplai
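A schematic of the latent-manipulation step the abstract describes, with the trained generator, classifier, latent code, and latent direction left as placeholders and a single-logit (binary) classifier assumed; this is a hedged sketch, not the StyleEx or EigenFind implementation:

import torch

def counterfactual_sweep(generator, classifier, z, direction, alphas):
    # Move a latent code along a single latent direction and record how the
    # classifier's predicted probability changes; a direction that flips the
    # prediction points at a pattern the classifier relies on.
    results = []
    with torch.no_grad():
        for alpha in alphas:
            image = generator(z + alpha * direction)
            prob = torch.sigmoid(classifier(image)).item()
            results.append((alpha, prob))
    return results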
Longitudinal Quantitative Assessment of COVID-19 Infection Progression from Chest CTs
Chest computed tomography (CT) has played an essential diagnostic role in
assessing patients with COVID-19 by showing disease-specific image features
such as ground-glass opacity and consolidation. Image segmentation methods have
proven to help quantify the disease burden and even help predict the outcome.
The availability of longitudinal CT series may also result in an efficient and
effective method to reliably assess the progression of COVID-19, monitor the
healing process and the response to different therapeutic strategies. In this
paper, we propose a new framework to identify infection at a voxel level
(identification of healthy lung, consolidation, and ground-glass opacity) and
visualize the progression of COVID-19 using sequential low-dose non-contrast CT
scans. In particular, we devise a longitudinal segmentation network that
utilizes the reference scan information to improve the performance of disease
identification. Experimental results on a clinical longitudinal dataset
collected in our institution show the effectiveness of the proposed method
compared to static deep neural networks for disease quantification.
Comment: MICCAI 202
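A schematic of the longitudinal idea in its simplest form: the follow-up scan and the reference scan are stacked as input channels so the network can condition on the earlier time point. The tiny convolutional body and the tensor sizes are assumptions; the paper's actual architecture is not reproduced here:

import torch
import torch.nn as nn

class LongitudinalSegNet(nn.Module):
    # Follow-up and reference CT volumes are concatenated as channels so the
    # network can use the earlier scan as context; a real model would use a
    # full 3D U-Net rather than these two convolutional layers.
    def __init__(self, n_classes=3):   # healthy lung, ground-glass opacity, consolidation
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(16, n_classes, kernel_size=1),
        )

    def forward(self, followup, reference):
        x = torch.cat([followup, reference], dim=1)   # (B, 2, D, H, W)
        return self.body(x)                           # per-voxel class scores

net = LongitudinalSegNet()
scan_t1 = torch.randn(1, 1, 16, 64, 64)
scan_t0 = torch.randn(1, 1, 16, 64, 64)
print(net(scan_t1, scan_t0).shape)   # torch.Size([1, 3, 16, 64, 64])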
A Survey on Transferability of Adversarial Examples across Deep Neural Networks
The emergence of Deep Neural Networks (DNNs) has revolutionized various
domains, enabling the resolution of complex tasks spanning image recognition,
natural language processing, and scientific problem-solving. However, this
progress has also exposed a concerning vulnerability: adversarial examples.
These crafted inputs, imperceptible to humans, can manipulate machine learning
models into making erroneous predictions, raising concerns for safety-critical
applications. An intriguing property of this phenomenon is the transferability
of adversarial examples, where perturbations crafted for one model can deceive
another, often one with a different architecture. This property enables
"black-box" attacks, circumventing the need for detailed knowledge of the
target model. This survey explores the landscape of adversarial example
transferability. We categorize existing methodologies
to enhance adversarial transferability and discuss the fundamental principles
guiding each approach. While the predominant body of research primarily
concentrates on image classification, we also extend our discussion to
encompass other vision tasks and beyond. Challenges and future prospects are
discussed, highlighting the importance of fortifying DNNs against adversarial
vulnerabilities in an evolving landscape.
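A minimal sketch of the transfer setting described above: an adversarial example is crafted with FGSM against a white-box surrogate model and then tested on a separately trained target model. Both models, the input batch, and the perturbation budget are placeholders, and FGSM is only the simplest of the surveyed attack families:

import torch
import torch.nn.functional as F

def fgsm_transfer(surrogate, target, x, labels, eps=8 / 255):
    # Craft a one-step FGSM perturbation on the surrogate (white-box access)
    # and check whether it also fools the target model (black-box transfer).
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(surrogate(x_adv), labels)
    loss.backward()
    x_adv = (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()
    with torch.no_grad():
        transferred = target(x_adv).argmax(dim=1) != labels
    return x_adv, transferred   # per-example flag: True where the attack transferred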